Impact of Macroeconomic Factors on CVS Stock Price
CVS Health Corp. is one of the leading companies in the healthcare sector, providing a wide range of services including pharmacy, health insurance, and retail clinics. An ARIMAX+ARCH/GARCH model can be used to analyze the behavior of its stock price and to quantify how macroeconomic factors affect the company's performance. For example, changes in healthcare policy, government regulation, and interest rates may affect the company's financial results and, in turn, its stock price. Demographic trends such as an aging population and the rising prevalence of chronic diseases may also shift demand for the healthcare services CVS provides and thereby move its stock price.
Time series Plot
Code
# get data
options("getSymbols.warning4.0"=FALSE)
options("getSymbols.yahoo.warning"=FALSE)
data.info <- getSymbols("CVS", src = 'yahoo', from = '2010-01-01', to = "2023-03-01", auto.assign = FALSE)
getSymbols("CVS", src = 'yahoo', from = '2010-01-01', to = "2023-03-01")  # assigns CVS in the workspace
df <- data.frame(Date = index(CVS), coredata(CVS))
# create Bollinger Bands
bbands <- BBands(CVS[,c("CVS.High","CVS.Low","CVS.Close")])
# join and subset data
df <- subset(cbind(df, data.frame(bbands[,1:3])), Date >= "2010-01-01")
# colors column for increasing and decreasing
for (i in 1:length(df[,1])) {
if (df$CVS.Close[i] >= df$CVS.Open[i]) {
df$direction[i] = 'Increasing'
} else {
df$direction[i] = 'Decreasing'
}
}
i <- list(line = list(color = '#AD1457'))
d <- list(line = list(color = '#7F7F7F'))
# plot candlestick chart
fig <- df %>% plot_ly(x = ~Date, type="candlestick",
open = ~CVS.Open, close = ~CVS.Close,
high = ~CVS.High, low = ~CVS.Low, name = "CVS",
increasing = i, decreasing = d)
fig <- fig %>% add_lines(x = ~Date, y = ~up , name = "B Bands",
line = list(color = '#ccc', width = 0.5),
legendgroup = "Bollinger Bands",
hoverinfo = "none", inherit = F)
fig <- fig %>% add_lines(x = ~Date, y = ~dn, name = "B Bands",
line = list(color = '#ccc', width = 0.5),
legendgroup = "Bollinger Bands", inherit = F,
showlegend = FALSE, hoverinfo = "none")
fig <- fig %>% add_lines(x = ~Date, y = ~mavg, name = "Mv Avg",
line = list(color = '#C052B3', width = 0.5),
hoverinfo = "none", inherit = F)
fig <- fig %>% layout(yaxis = list(title = "Price"))
# plot volume bar chart
fig2 <- df
fig2 <- fig2 %>% plot_ly(x=~Date, y=~CVS.Volume, type='bar', name = "CVS Volume",
color = ~direction, colors = c('#AD1457','#7F7F7F'))
fig2 <- fig2 %>% layout(yaxis = list(title = "Volume"))
# create rangeselector buttons
rs <- list(visible = TRUE, x = 0.5, y = -0.055,
xanchor = 'center', yref = 'paper',
font = list(size = 9),
buttons = list(
list(count=1,
label='RESET',
step='all'),
list(count=3,
label='3 YR',
step='year',
stepmode='backward'),
list(count=1,
label='1 YR',
step='year',
stepmode='backward'),
list(count=1,
label='1 MO',
step='month',
stepmode='backward')
))
# subplot with shared x axis
fig <- subplot(fig, fig2, heights = c(0.7,0.2), nrows=2,
shareX = TRUE, titleY = TRUE)
fig <- fig %>% layout(title = paste("CVS Stock Price: January 2010 - March 2023"),
xaxis = list(rangeselector = rs),
legend = list(orientation = 'h', x = 0.5, y = 1,
xanchor = 'center', yref = 'paper',
font = list(size = 10),
bgcolor = 'transparent'))
fig
Code
log(data.info$`CVS.Adjusted`) %>% diff() %>% chartSeries(theme=chartTheme('white'),up.col='#AD1457')
Code
#import the data
gdp <- read.csv("DATA/RAW DATA/gdp-growth.csv")
#change date format
gdp$Date <- as.Date(gdp$DATE , "%m/%d/%Y")
#drop DATE column
gdp <- subset(gdp, select = -c(1))
#export the cleaned data
gdp_clean <- gdp
write.csv(gdp_clean, "DATA/CLEANED DATA/gdp_clean_data.csv", row.names=FALSE)
#plot gdp growth rate
fig <- plot_ly(gdp, x = ~Date, y = ~value, type = 'scatter', mode = 'lines',line = list(color = 'rgb(240, 128, 128)'))
fig <- fig %>% layout(title = "U.S. GDP Growth Rate: 2010 - 2022",xaxis = list(title = "Time"),yaxis = list(title ="GDP Growth Rate"))
fig
Code
#import the data
inflation_rate <- read.csv("DATA/RAW DATA/inflation-rate.csv")
#cleaning the data
#remove unwanted columns
inflation_rate_clean <- subset(inflation_rate, select = -c(1,HALF1,HALF2))
#convert the data to time series data
inflation_data_ts <- ts(as.vector(t(as.matrix(inflation_rate_clean))), start=c(2010,1), end=c(2023,2), frequency=12)
#export the data
write.csv(inflation_rate_clean, "DATA/CLEANED DATA/inflation_rate_clean_data.csv", row.names=FALSE)
#plot inflation rate
fig <- autoplot(inflation_data_ts, ylab = "Inflation Rate", color="#FFA07A")+ggtitle("U.S Inflation Rate: January 2010 - February 2023")+theme_bw()
ggplotly(fig)
Code
#import the data
interest_data <- read.csv("DATA/RAW DATA/interest-rate.csv")
#change date format
interest_data$Date <- as.Date(interest_data$Date , "%m/%d/%Y")
#export the cleaned data
interest_clean_data <- interest_data
write.csv(interest_clean_data, "DATA/CLEANED DATA/interest_rate_clean_data.csv", row.names=FALSE)
#plot interest rate
fig <- plot_ly(interest_data, x = ~Date, y = ~value, type = 'scatter', mode = 'lines',line = list(color='rgb(219, 112, 147)'))
fig <- fig %>% layout(title = "U.S Interest Rate: January 2010 - March 2023",xaxis = list(title = "Time"),yaxis = list(title ="Interest Rate"))
fig
Code
#import the data
unemployment_rate <- read.csv("DATA/RAW DATA/unemployment-rate.csv")
#change date format
unemployment_rate$Date <- as.Date(unemployment_rate$Date , "%m/%d/%Y")
# export the data
write.csv(unemployment_rate, "DATA/CLEANED DATA/unemployment_rate_clean_data.csv", row.names=FALSE)
#plot unemployment rate
fig <- plot_ly(unemployment_rate, x = ~Date, y = ~Value, type = 'scatter', mode = 'lines',line = list(color = 'rgb(189, 183, 107)'))
fig <- fig %>% layout(title = "U.S Unemployment Rate: January 2010 - March 2023",xaxis = list(title = "Time"),yaxis = list(title ="Unemployment Rate"))
fig
The company’s stock price has exhibited a similar pattern to that of the broader healthcare sector, with fluctuations that are often driven by changes in healthcare policy, market competition, and macroeconomic conditions.
From 2010 to 2015, CVS’s stock price experienced a steady rise, as the company expanded its retail pharmacy and healthcare services businesses. However, in 2015, CVS announced its plans to acquire health insurer Aetna, which led to some investor uncertainty and caused a temporary dip in the company’s stock price.
Following the acquisition, CVS’s stock price remained relatively stable until early 2020, when the COVID-19 pandemic began to impact the healthcare industry. While the pandemic initially caused some volatility in CVS’s stock price, the company’s strong position in the healthcare sector and its focus on expanding its digital capabilities helped to stabilize its performance.
Since mid-2020, CVS’s stock price has exhibited a generally upward trend, likely due to the company’s efforts to enhance its digital and e-commerce offerings, expand its healthcare services, and streamline its operations. In early 2021, CVS also announced plans to acquire health insurer Centene Corp’s Illinois unit, which could help to further bolster its position in the healthcare industry.
Since early 2021, CVS’s stock price has experienced some volatility, likely due to a combination of factors such as global economic uncertainty and fluctuations in consumer demand for the company’s products.
As discussed before, the macroeconomic factors of GDP growth, inflation, interest rates, and unemployment rate are closely interrelated and play a crucial role in the overall health and stability of an economy. From 2010 to 2023, the global economy experienced a mix of ups and downs, with periods of strong GDP growth followed by slowdowns and recessions.
The second plot shows the first difference of the logarithm of the adjusted CVS stock price. Taking the first difference removes any long-term trends and transforms the time series into a stationary process. From the plot, we can observe that the first difference of the logarithm of the CVS stock price appears to be stationary, as the mean and variance are roughly constant over time.
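The log-difference transform described above can be sketched on simulated data (a generic illustration, not the CVS series itself):

```r
# Sketch on simulated data: the first difference of the log price is the
# continuously compounded return, which is roughly stationary even when the
# price itself wanders like a random walk.
set.seed(42)
log_price <- cumsum(rnorm(500, mean = 0.001, sd = 0.02))  # random walk in logs
price <- exp(log_price)

returns <- diff(log(price))  # equivalent to diff(log_price)

# the level series trends, but the returns have roughly constant mean/variance
c(mean(returns), sd(returns))
```

This is exactly the `log(...) %>% diff()` pipeline applied to the adjusted price earlier in the document.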
Endogenous and Exogenous Variables
Code
numeric_data <- c("CVS.Adjusted","gdp", "interest", "inflation", "unemployment")
numeric_data <- final[, numeric_data]
normalized_data_numeric <- scale(numeric_data)
normalized_data <- ts(normalized_data_numeric, start = c(2010, 1), end = c(2021,10),frequency = 4)
ts_plot(normalized_data,
title = "Normalized Time Series Data for CVS Stock and Macroeconomic Variables",
Ytitle = "Normalized Values",
Xtitle = "Year")
Code
# Get upper triangle of the correlation matrix
get_upper_tri <- function(cormat){
cormat[lower.tri(cormat)]<- NA
return(cormat)
}
cormat <- round(cor(normalized_data_numeric),2)
upper_tri <- get_upper_tri(cormat)
melted_cormat <- melt(upper_tri, na.rm = TRUE)
# Create a ggheatmap
ggheatmap <- ggplot(melted_cormat, aes(Var2, Var1, fill = value))+
geom_tile(color = "white")+
scale_fill_gradient2(low = "blue", high = "red", mid = "white",
midpoint = 0, limit = c(-1,1), space = "Lab",
name="Pearson\nCorrelation") +
theme_minimal()+ # minimal theme
theme(axis.text.x = element_text(angle = 45, vjust = 1,
size = 12, hjust = 1))+
coord_fixed()
ggheatmap +
geom_text(aes(Var2, Var1, label = value), color = "black", size = 4) +
theme(
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.grid.major = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
axis.ticks = element_blank(),
legend.justification = c(1, 0),
legend.position = c(0.6, 0.7),
legend.direction = "horizontal")+
guides(fill = guide_colorbar(barwidth = 7, barheight = 1,
title.position = "top", title.hjust = 0.5))
Code
par(mfrow=c(1,1))
ccf_result <- ccf(normalized_data[, c("CVS.Adjusted")], normalized_data[, c("gdp")],
lag.max = 300,
main = "Cross-Correlation Plot for CVS Stock Price and GDP Growth Rate",
ylab = "CCF")
Code
cat("The sum of the cross-correlation function is", sum(abs(ccf_result$acf)))
The sum of the cross-correlation function is 6.66753
Code
par(mfrow=c(1,1))
ccf_result <- ccf(normalized_data[, c("CVS.Adjusted")], normalized_data[, c("interest")],
lag.max = 300,
main = "Cross-Correlation Plot for CVS Stock Price and Interest Rate",
ylab = "CCF")
Code
cat("The sum of the cross-correlation function is", sum(abs(ccf_result$acf)))
The sum of the cross-correlation function is 14.04391
Code
par(mfrow=c(1,1))
ccf_result <- ccf(normalized_data[, c("CVS.Adjusted")], normalized_data[, c("inflation")],
lag.max = 300,
main = "Cross-Correlation Plot for CVS Stock Price and Inflation Rate",
ylab = "CCF")
Code
cat("The sum of the cross-correlation function is", sum(abs(ccf_result$acf)))
The sum of the cross-correlation function is 15.34855
Code
par(mfrow=c(1,1))
ccf_result <- ccf(normalized_data[, c("CVS.Adjusted")], normalized_data[, c("unemployment")],
lag.max = 300,
main = "Cross-Correlation Plot for CVS Stock Price and Unemployment Rate",
ylab = "CCF")
Code
cat("The sum of the cross-correlation function is", sum(abs(ccf_result$acf)))
The sum of the cross-correlation function is 15.40655
The Normalized Time Series Data for Stock Price and Macroeconomic Variables plot shows the same variables as the first plot after standardization with the scale() function in R, which transforms each variable to have a mean of 0 and a standard deviation of 1 (z-scores, not a 0-to-1 range), so that series on very different scales can be compared. The heatmap of the normalized data reveals that inflation and the unemployment rate exhibit strong positive correlations with the stock price, indicating that these variables may significantly influence stock price movements. Weaker correlations were observed between the stock price and GDP growth and interest rates, suggesting these variables may have less impact on stock price fluctuations. The cross-correlation plots support these findings, indicating that the inflation rate and the unemployment rate are the more suitable feature variables for the ARIMAX model when predicting CVS movements.
Final Exogenous variables: Macroeconomic indicators: Inflation rate and unemployment rate.
Endogenous and Exogenous Variables Plot
Code
final_data <- final %>%dplyr::select( Date,CVS.Adjusted, inflation,unemployment)
numeric_data <- c("CVS.Adjusted", "inflation","unemployment")
numeric_data <- final_data[, numeric_data]
normalized_data_numeric <- scale(numeric_data)
normalized_numeric_df <- data.frame(normalized_data_numeric)
normalized_data_ts <- ts(normalized_data_numeric, start = c(2010, 1), frequency = 4)
autoplot(normalized_data_ts, facets=TRUE) +
xlab("Year") + ylab("") +
ggtitle("CVS Stock Price, Inflation Rate and Unemployment Rate in USA 2010-2023")
Code
# Convert your multivariate time series data to a matrix
final_data_ts_multivariate <- as.matrix(normalized_data_ts)
# Check for stationarity using Phillips-Perron test
phillips_perron_test <- ur.pp(final_data_ts_multivariate)
summary(phillips_perron_test)
##################################
# Phillips-Perron Unit Root Test #
##################################
Test regression with intercept
Call:
lm(formula = y ~ y.l1)
Residuals:
Min 1Q Median 3Q Max
-1.9919 -0.1733 -0.0624 0.1315 4.4728
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.004471 0.042538 0.105 0.916
y.l1 0.841642 0.042969 19.587 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.5296 on 153 degrees of freedom
Multiple R-squared: 0.7149, Adjusted R-squared: 0.713
F-statistic: 383.7 on 1 and 153 DF, p-value: < 2.2e-16
Value of test-statistic, type: Z-alpha is: -25.4391
aux. Z statistics
Z-tau-mu 0.1044
The Phillips-Perron unit root test provides strong evidence against the null hypothesis of a unit root: the Z-alpha test statistic of -25.4391 is large and negative, and the coefficient on the lagged variable is highly significant (p < 0.05). This suggests that the series being tested is likely stationary. (The auxiliary Z-tau-mu statistic of 0.1044 concerns the intercept term, not the unit-root hypothesis itself.)
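The behavior of the test can be checked on simulated data. The sketch below assumes the urca package (the source of the ur.pp() function used above); the series and AR coefficient are illustrative only:

```r
# Sketch: for a simulated stationary AR(1) series, the Phillips-Perron
# statistic is large and negative, rejecting the unit-root null -
# mirroring the result obtained for the normalized data above.
library(urca)
set.seed(1)
y_stationary <- arima.sim(model = list(ar = 0.5), n = 200)
pp <- ur.pp(y_stationary, type = "Z-tau", model = "constant")
pp@teststat  # strongly negative, well below the 5% critical value
```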
To determine whether the linear model requires an ARCH model, an ARCH test is conducted. The ACF and PACF plots are also used to identify suitable model values.
Model Fitting
Code
normalized_numeric_df$CVS.Adjusted <- ts(normalized_numeric_df$CVS.Adjusted, start = decimal_date(as.Date("2010-01-01", format = "%Y-%m-%d")), frequency = 4)
normalized_numeric_df$inflation <- ts(normalized_numeric_df$inflation, start = decimal_date(as.Date("2010-01-01", format = "%Y-%m-%d")), frequency = 4)
normalized_numeric_df$unemployment <- ts(normalized_numeric_df$unemployment, start = decimal_date(as.Date("2010-01-01", format = "%Y-%m-%d")), frequency = 4)
fit <- lm(CVS.Adjusted ~ inflation + unemployment, data = normalized_numeric_df)
fit.res <- ts(residuals(fit), start = decimal_date(as.Date("2010-01-01", format = "%Y-%m-%d")), frequency = 4)
############## Then look at the residuals ############
returns <- fit.res %>% diff()
autoplot(returns)+ggtitle("Linear Model Returns")
Code
byd.archTest <- ArchTest(fit.res, lags = 1, demean = TRUE)
byd.archTest
ARCH LM-test; Null hypothesis: no ARCH effects
data: fit.res
Chi-squared = 6.232, df = 1, p-value = 0.01255
Code
ggAcf(returns) + ggtitle("ACF for returns")
Code
ggPacf(returns) + ggtitle("PACF for returns")
The ARCH LM-test was conducted with the null hypothesis of no ARCH effects. The test resulted in a chi-squared value of 6.232 with one degree of freedom and a low p-value of 0.01255. This suggests strong evidence against the null hypothesis, indicating the presence of ARCH effects in the data.
Based on the ACF and PACF plots, it appears that there is some significant autocorrelation and partial autocorrelation at multiple lags, which suggests that an ARIMA model may not be sufficient to capture the time series behavior. Additionally, the values for p and q appear to be relatively high, with p = 3 and q = 8 being suggested by the plots.
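The logic of the ARCH LM-test can be illustrated on data simulated from a known ARCH(1) process. This sketch assumes the FinTS package (the source of the ArchTest() function used above); all parameter values are illustrative:

```r
# Sketch: data simulated from an ARCH(1) process should make the LM test
# reject the null of no ARCH effects, just as the regression residuals did.
library(FinTS)
set.seed(123)
n <- 500
omega <- 0.1; alpha1 <- 0.6
eps <- numeric(n)
sigma2 <- omega / (1 - alpha1)           # start at the unconditional variance
for (t in 1:n) {
  eps[t] <- rnorm(1, sd = sqrt(sigma2))
  sigma2 <- omega + alpha1 * eps[t]^2    # ARCH(1) recursion for the next step
}
ArchTest(eps, lags = 1)  # small p-value indicates ARCH effects
```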
ARIMAX Model
Code
xreg <- cbind(Inflation = normalized_data_ts[, "inflation"],
Unemployment = normalized_data_ts[, "unemployment"])
fit.auto <- auto.arima(normalized_data_ts[, "CVS.Adjusted"], xreg = xreg)
summary(fit.auto)
Series: normalized_data_ts[, "CVS.Adjusted"]
Regression with ARIMA(0,1,0)(0,0,1)[4] errors
Coefficients:
sma1 Inflation Unemployment
0.4196 0.2311 -0.0734
s.e. 0.1281 0.1097 0.0511
sigma^2 = 0.07929: log likelihood = -6.57
AIC=21.15 AICc=22.02 BIC=28.88
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set 0.03193475 0.2705407 0.2096642 36.92293 74.53086 0.4005753
ACF1
Training set -0.07923522
Code
checkresiduals(fit.auto)
Ljung-Box test
data: Residuals from Regression with ARIMA(0,1,0)(0,0,1)[4] errors
Q* = 7.2257, df = 7, p-value = 0.4058
Model df: 1. Total lags used: 8
Code
ARIMA.c=function(p1,p2,q1,q2,data){
temp=c()
d=1
i=1
temp= data.frame()
ls=matrix(rep(NA,6*30),nrow=30)
for (p in p1:p2)#
{
for(q in q1:q2)#
{
for(d in 0:1)
{
if(p+d+q<=6)
{
model<- Arima(data,order=c(p,d,q))
ls[i,]= c(p,d,q,model$aic,model$bic,model$aicc)
i=i+1
#print(i)
}
}
}
}
temp= as.data.frame(ls)
names(temp)= c("p","d","q","AIC","BIC","AICc")
temp
}
output <- ARIMA.c(1,8,1,2,data=residuals(fit))
output[which.min(output$AIC),]
  p d q      AIC      BIC     AICc
5 2 0 1 59.21013 68.96635 60.51448
Code
output[which.min(output$BIC),]
  p d q      AIC      BIC     AICc
2 1 1 1 60.59251 66.38799 61.10315
Code
output[which.min(output$AICc),]
  p d q      AIC      BIC     AICc
5 2 0 1 59.21013 68.96635 60.51448
Code
set.seed(1234)
model_output <- capture.output(sarima(fit.res, 2,0,1))
Code
cat(model_output[38:69], model_output[length(model_output)], sep = "\n")
$fit
Call:
arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S),
xreg = xmean, include.mean = FALSE, transform.pars = trans, fixed = fixed,
optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ar1 ar2 ma1 xmean
-0.0494 0.5842 1.0000 -0.0355
s.e. 0.1158 0.1153 0.0813 0.2141
sigma^2 estimated as 0.1426: log likelihood = -24.61, aic = 59.21
$degrees_of_freedom
[1] 48
$ttable
Estimate SE t.value p.value
ar1 -0.0494 0.1158 -0.4267 0.6715
ar2 0.5842 0.1153 5.0647 0.0000
ma1 1.0000 0.0813 12.2957 0.0000
xmean -0.0355 0.2141 -0.1659 0.8690
$AIC
[1] 1.138656
$AICc
[1] 1.155023
$BIC
[1] 1.326276
Code
set.seed(1234)
model_output <- capture.output(sarima(fit.res, 1,1,1))
Code
cat(model_output[40:70], model_output[length(model_output)], sep = "\n")
$fit
Call:
arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S),
xreg = constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc,
REPORT = 1, reltol = tol))
Coefficients:
ar1 ma1 constant
-0.8170 1.0000 0.0126
s.e. 0.0886 0.0711 0.0625
sigma^2 estimated as 0.1647: log likelihood = -27.28, aic = 62.55
$degrees_of_freedom
[1] 48
$ttable
Estimate SE t.value p.value
ar1 -0.8170 0.0886 -9.2196 0.0000
ma1 1.0000 0.0711 14.0599 0.0000
constant 0.0126 0.0625 0.2024 0.8404
$AIC
[1] 1.226501
$AICc
[1] 1.236513
$BIC
[1] 1.378016
Code
n=length(fit.res)
k= 51
rmse1 <- matrix(NA, (n-k),4)
rmse2 <- matrix(NA, (n-k),4)
rmse3 <- matrix(NA, (n-k),4)
st <- tsp(fit.res)[1]+(k-5)/4
for(i in 1:(n-k))
{
xtrain <- window(fit.res, end=st + i/4)
xtest <- window(fit.res, start=st + (i+1)/4, end=st + (i+4)/4)
# candidate models: ARIMA(2,0,1), ARIMA(1,1,1), ARIMA(0,1,0)(0,0,1)[4]
fit <- Arima(xtrain, order=c(2,0,1),
include.drift=TRUE, method="ML")
fcast <- forecast(fit, h=4)
fit2 <- Arima(xtrain, order=c(1,1,1),
include.drift=TRUE, method="ML")
fcast2 <- forecast(fit2, h=4)
fit3 <- Arima(xtrain, order=c(0,1,0),seasonal=c(0,0,1),
include.drift=TRUE, method="ML")
fcast3 <- forecast(fit3, h=4)
rmse1[i,1:length(xtest)] <- sqrt((fcast$mean-xtest)^2)
rmse2[i,1:length(xtest)] <- sqrt((fcast2$mean-xtest)^2)
rmse3[i,1:length(xtest)] <- sqrt((fcast3$mean-xtest)^2)
}
plot(1:4,colMeans(rmse1,na.rm=TRUE), type="l",col=2, xlab="horizon", ylab="RMSE")
lines(1:4, colMeans(rmse2,na.rm=TRUE), type="l",col=3)
lines(1:4, colMeans(rmse3,na.rm=TRUE), type="l",col=4)
legend("topleft",legend=c("fit1","fit2","fit3"),col=2:4,lty=1)
Based on the results of the auto.arima function, the suggested best model is ARIMA(0,1,0)(0,0,1)[4]. However, when we manually test different ARIMA models on the regression residuals, ARIMA(2,0,1) has the lowest AIC and AICc, while ARIMA(1,1,1) has the lowest BIC. When the models are compared by cross-validated RMSE, ARIMA(1,1,1) and ARIMA(0,1,0)(0,0,1)[4] perform similarly, but ARIMA(1,1,1) has much lower AIC and BIC than ARIMA(0,1,0)(0,0,1)[4]. So our best model is ARIMA(1,1,1).
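The RMSE comparison above follows a standard expanding-window (rolling-origin) cross-validation pattern, which can be sketched generically. The series and model order below are illustrative, not the CVS residuals:

```r
# Generic sketch of the expanding-window scheme: refit on a growing training
# window, forecast h steps ahead, and average the absolute errors by horizon.
library(forecast)
set.seed(5)
y <- ts(arima.sim(model = list(ar = 0.6), n = 60), frequency = 4)
h <- 4
k <- 40                                    # initial training-window length
err <- matrix(NA, length(y) - k - h + 1, h)
for (i in 1:nrow(err)) {
  train <- ts(y[1:(k + i - 1)], frequency = 4)
  fc <- forecast(Arima(train, order = c(1, 0, 0)), h = h)
  err[i, ] <- as.numeric(fc$mean) - y[(k + i):(k + i + h - 1)]
}
colMeans(abs(err))  # mean absolute error at each forecast horizon
```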
We can then proceed to choose the best GARCH model using ARIMA(1,1,1) as the base model.
Squared Residuals
Code
fit <- lm(CVS.Adjusted ~ inflation+unemployment, data=normalized_numeric_df)
fit.res<-ts(residuals(fit),star=decimal_date(as.Date("2010-01-01",format = "%Y-%m-%d")),frequency = 4)
fit <- Arima(fit.res,order=c(1,1,1))
res=fit$res
plot(res^2,main='Squared Residuals')
Code
acf(res^2,24, main = "ACF Residuals Square")
Code
pacf(res^2,24, main = "PACF Residuals Square")
Code
acf(abs(res), main = "ACF of Absolute Residuals")
Code
pacf(abs(res), main = "PACF of Absolute Residuals")
Code
summary(garchFit(~garch(1,0),res, trace=F))
Title:
GARCH Modelling
Call:
garchFit(formula = ~garch(1, 0), data = res, trace = F)
Mean and Variance Equation:
data ~ garch(1, 0)
<environment: 0x11097ef60>
[data = res]
Conditional Distribution:
norm
Coefficient(s):
mu omega alpha1
-0.004657 0.074527 0.772521
Std. Errors:
based on Hessian
Error Analysis:
Estimate Std. Error t value Pr(>|t|)
mu -0.004657 0.046134 -0.101 0.91959
omega 0.074527 0.022857 3.261 0.00111 **
alpha1 0.772521 0.378439 2.041 0.04122 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log Likelihood:
-21.43535 normalized: -0.4122182
Description:
Sat Jan 6 19:53:57 2024 by user:
Standardised Residuals Tests:
Statistic p-Value
Jarque-Bera Test R Chi^2 37.97519 5.672745e-09
Shapiro-Wilk Test R W 0.9085432 0.000724795
Ljung-Box Test R Q(10) 7.703958 0.6577284
Ljung-Box Test R Q(15) 11.08966 0.7462132
Ljung-Box Test R Q(20) 20.53374 0.4250177
Ljung-Box Test R^2 Q(10) 2.098718 0.995526
Ljung-Box Test R^2 Q(15) 3.245988 0.9993468
Ljung-Box Test R^2 Q(20) 4.060657 0.9999474
LM Arch Test R TR^2 3.304628 0.9929844
Information Criterion Statistics:
AIC BIC SIC HQIC
0.9398210 1.0523927 0.9336356 0.9829783
From the ACF and PACF plots of the squared residuals of the best ARIMA model, most values lie within the blue significance bands, indicating that the squared residuals behave like white noise. The Ljung-Box tests on the squared standardized residuals also have p-values close to 1, so there is no evidence of remaining autocorrelation or ARCH structure, suggesting the model fits well.
The best models are ARIMA(1,1,1) and ARCH(1).
Best Model
Code
#fiting an ARIMA model to the Inflation variable
inflation_fit<-auto.arima(normalized_numeric_df$inflation)
finflation<-forecast(inflation_fit)
#fitting an ARIMA model to the Unemployment variable
unemployment_fit<-auto.arima(normalized_numeric_df$unemployment)
funemployment<-forecast(unemployment_fit)
# best model fit for forecasting
xreg <- cbind(Inflation = normalized_data_ts[, "inflation"],
Unemployment = normalized_data_ts[, "unemployment"])
summary(arima.fit <- Arima(normalized_data_ts[, "CVS.Adjusted"], order=c(1,1,1), xreg=xreg))
Series: normalized_data_ts[, "CVS.Adjusted"]
Regression with ARIMA(1,1,1) errors
Coefficients:
ar1 ma1 Inflation Unemployment
-0.3281 0.3754 0.2266 -0.1084
s.e. 0.8664 0.8446 0.1377 0.0637
sigma^2 = 0.09542: log likelihood = -10.37
AIC=30.75 AICc=32.08 BIC=40.4
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set 0.04128572 0.2936735 0.2223189 35.05635 89.53581 0.4247529
ACF1
Training set -0.05055536
Code
summary(final.fit <- garchFit(~garch(1,0), res,trace = F))
Title:
GARCH Modelling
Call:
garchFit(formula = ~garch(1, 0), data = res, trace = F)
Mean and Variance Equation:
data ~ garch(1, 0)
<environment: 0x124e42400>
[data = res]
Conditional Distribution:
norm
Coefficient(s):
mu omega alpha1
-0.004657 0.074527 0.772521
Std. Errors:
based on Hessian
Error Analysis:
Estimate Std. Error t value Pr(>|t|)
mu -0.004657 0.046134 -0.101 0.91959
omega 0.074527 0.022857 3.261 0.00111 **
alpha1 0.772521 0.378439 2.041 0.04122 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log Likelihood:
-21.43535 normalized: -0.4122182
Description:
Sat Jan 6 19:53:58 2024 by user:
Standardised Residuals Tests:
Statistic p-Value
Jarque-Bera Test R Chi^2 37.97519 5.672745e-09
Shapiro-Wilk Test R W 0.9085432 0.000724795
Ljung-Box Test R Q(10) 7.703958 0.6577284
Ljung-Box Test R Q(15) 11.08966 0.7462132
Ljung-Box Test R Q(20) 20.53374 0.4250177
Ljung-Box Test R^2 Q(10) 2.098718 0.995526
Ljung-Box Test R^2 Q(15) 3.245988 0.9993468
Ljung-Box Test R^2 Q(20) 4.060657 0.9999474
LM Arch Test R TR^2 3.304628 0.9929844
Information Criterion Statistics:
AIC BIC SIC HQIC
0.9398210 1.0523927 0.9336356 0.9829783
Code
ht <- final.fit@h.t #a numeric vector with the conditional variances (h.t = sigma.t^delta)
#############################
data=data.frame(final)
data$Date<-as.Date(data$Date,"%Y-%m-%d")
data2= data.frame(ht,data$Date)
ggplot(data2, aes(y = ht, x = data.Date)) + geom_line(col = '#AD1457') + ylab('Conditional Variance') + xlab('Date')
From the ARIMA(1,1,1) fit, the training-set error measures suggest a good fit, with low mean absolute error, low root mean squared error, and little autocorrelation in the residuals. The ARCH(1) (i.e., GARCH(1,0)) model is then used to estimate the volatility of the residuals from the regression model. It consists of a mean equation for the level of the residuals and a variance equation for their conditional variance. The mean-equation coefficient suggests the residual mean is close to zero, while the variance equation shows the conditional variance depends on the past squared residuals. The model's AIC, BIC, SIC, and HQIC values are all relatively low, indicating a good fit. The Ljung-Box and LM ARCH tests on the standardized residuals show no significant remaining autocorrelation or ARCH effects, although the Jarque-Bera and Shapiro-Wilk tests suggest the residuals deviate somewhat from normality.
The volatility of the model seems high in 2020 but has decreased gradually in the past few months. This could indicate that the asset’s price was experiencing a lot of fluctuations in 2020, but the market has stabilized recently.
Model Diagnostics
Code
fit2<-garch(res,order=c(1,0),trace=F)
checkresiduals(fit2)
Code
qqnorm(fit2$residuals, pch = 1)
qqline(fit2$residuals, col = "blue", lwd = 2)
Code
Box.test (fit2$residuals, type = "Ljung")
Box-Ljung test
data: fit2$residuals
X-squared = 0.48064, df = 1, p-value = 0.4881
The ACF plot of the residuals shows all values within the blue significance bands, indicating that the residuals are not significantly autocorrelated, and the residuals fall in an acceptable range of roughly -2 to 2. Additionally, the points in the QQ plot lie close to the reference line, suggesting that the residuals are approximately normally distributed.
For the Box-Ljung test, the p-value of 0.4881 indicates that the model’s residuals are not significantly autocorrelated, meaning that the model has captured most of the information in the data. This suggests that the model is a good fit and has accounted for most of the underlying patterns, so we can use its predictions with reasonable confidence.
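The Box-Ljung test's behavior can be sketched on simulated data (illustrative series, not the model residuals):

```r
# Sketch: Box.test with type = "Ljung-Box" typically gives a large p-value
# for white noise (as for the model residuals above) and a tiny p-value
# for an autocorrelated series.
set.seed(7)
wn <- rnorm(300)                                    # white noise
ar <- arima.sim(model = list(ar = 0.7), n = 300)    # autocorrelated AR(1)
Box.test(wn, lag = 10, type = "Ljung-Box")$p.value  # typically large
Box.test(ar, lag = 10, type = "Ljung-Box")$p.value  # effectively zero
```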
Forecast
Code
predict(final.fit, n.ahead = 5, plot=TRUE)
  meanForecast meanError standardDeviation lowerInterval upperInterval
1 -0.004657039 0.2997059 0.2997059 -0.5920698 0.5827557
2 -0.004657039 0.3793652 0.3793652 -0.7481991 0.7388850
3 -0.004657039 0.4309373 0.4309373 -0.8492786 0.8399645
4 -0.004657039 0.4668938 0.4668938 -0.9197521 0.9104381
5 -0.004657039 0.4928784 0.4928784 -0.9706809 0.9613668
The forecast plot is based on the best model, ARIMAX(1,1,1)+ARCH(1). This model captures the autoregressive and moving-average components of the data, the impact of the exogenous variables on the series, and, through the ARCH component, the volatility clustering in the data, making it well suited to forecasting this series.
Equation of the Model
The equation of the ARIMAX(1,1,1) model is:
\(Y_t = c + \phi_1 (Y_{t-1} - X_{t-1}) + \theta_1 \epsilon_{t-1} + \epsilon_t\)
where \(Y_t\) is the time series variable, \(X_{t-1}\) is the exogenous variable, \(c\) is a constant, \(\phi_1\) and \(\theta_1\) are the parameters, and \(\epsilon_t\) is the error term.
The equation of the ARCH(1) model is:
\(\sigma_{t}^{2}=\alpha_{0}+\alpha_{1}\epsilon_{t-1}^{2}\)
where \(\sigma_{t}^{2}\) is the conditional variance at time \(t\), \(\alpha_0\) is a constant, \(\alpha_1\) is the ARCH parameter, and \(\epsilon_t\) is the error term.
The combined equation of the ARIMAX(1,1,1)+ARCH(1) model is:
\(Y_t = c + \phi_1 (Y_{t-1} - X_{t-1}) + \theta_1 \epsilon_{t-1} + \epsilon_t\)
\(\sigma_{t}^{2}=\alpha_{0}+\alpha_{1}\epsilon_{t-1}^{2}\)
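A quick numerical check of the ARCH(1) variance equation: when \(\alpha_1 < 1\), the recursion implies an unconditional variance of \(\alpha_0 / (1 - \alpha_1)\), which a long simulation recovers approximately (parameter values below are illustrative):

```r
# Sketch: simulate the ARCH(1) recursion sigma_t^2 = alpha0 + alpha1*eps_{t-1}^2
# and compare the sample variance of eps with alpha0 / (1 - alpha1).
set.seed(99)
alpha0 <- 0.05; alpha1 <- 0.4
n <- 100000
eps <- numeric(n)
s2 <- alpha0 / (1 - alpha1)            # start at the stationary variance
for (t in 1:n) {
  eps[t] <- rnorm(1, sd = sqrt(s2))
  s2 <- alpha0 + alpha1 * eps[t]^2     # conditional variance for the next step
}
c(theoretical = alpha0 / (1 - alpha1), simulated = var(eps))
```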